Stephen Smith's Blog

Musings on Machine Learning…

Risc-V Assembly Language Hello World

with 22 comments


Introduction

Last time, we started talking about the Risc-V CPU. We looked at some background, and now we are going to start looking at its Assembly Language. We’ll write a program to print “Hello World!” to the terminal window, cross-compile it with the GCC toolchain and run it in a Risc-V emulator. This program lets us start discussing some features of the core Risc-V instruction set. Risc-V supports 32-bit, 64-bit and 128-bit implementations; here we’ll use 64 bits.

We’ll start with the program, then discuss various aspects of the Assembly instructions it uses and finally discuss how to build and run the program.

Hello World

First let’s present the program and then we’ll discuss it. This program works by making Linux system calls and, like all Linux programs, starts execution at the globally exported _start label, which the linker uses as the entry point by default. The program uses the Assembly directives specified in the GNU Assembler (GCC toolchain) documentation.

#
# Risc-V Assembler program to print "Hello World!"
# to stdout.
#
# a0-a2 - parameters to linux function services
# a7 - linux function number
#

.global _start      # Provide program starting address to linker

# Setup the parameters to print hello world
# and then call Linux to do it.

_start: addi  a0, x0, 1      # 1 = StdOut
        la    a1, helloworld # load address of helloworld
        addi  a2, x0, 13     # length of our string
        addi  a7, x0, 64     # linux write system call
        ecall                # Call linux to output the string

# Setup the parameters to exit the program
# and then call Linux to do it.

        addi    a0, x0, 0   # Use 0 return code
        addi    a7, x0, 93  # Service command code 93 terminates
        ecall               # Call linux to terminate the program

.data
helloworld:      .ascii "Hello World!\n"

The ‘#’ character is the comment character and anything after it on a line is a comment.

Registers

The Risc-V processor has 32 registers labeled x0 to x31 and a program counter (PC). x0 is a zero register: it always reads as 0 and ignores writes. x1 to x31 can be used by programs as they wish. If you look at our listing for Hello World, you will notice that we are using registers a0, a1, a2 and a7. What are these? The Risc-V architecture itself says little about register usage, but typical Assembly language programming needs a stack pointer, a subroutine return address register and some sort of function calling convention. These are defined in an Application Binary Interface (ABI), a software standard that lets programs, libraries and the operating system work together properly. GCC knows about the Risc-V Linux ABI, where register usage is defined as:

Register  ABI Name  Use by convention                    Preserved?
x0        zero      hardwired to 0, ignores writes       n/a
x1        ra        return address for jumps             no
x2        sp        stack pointer                        yes
x3        gp        global pointer                       n/a
x4        tp        thread pointer                       n/a
x5        t0        temporary register 0                 no
x6        t1        temporary register 1                 no
x7        t2        temporary register 2                 no
x8        s0 or fp  saved register 0 or frame pointer    yes
x9        s1        saved register 1                     yes
x10       a0        return value or function argument 0  no
x11       a1        return value or function argument 1  no
x12       a2        function argument 2                  no
x13       a3        function argument 3                  no
x14       a4        function argument 4                  no
x15       a5        function argument 5                  no
x16       a6        function argument 6                  no
x17       a7        function argument 7                  no
x18       s2        saved register 2                     yes
x19       s3        saved register 3                     yes
x20       s4        saved register 4                     yes
x21       s5        saved register 5                     yes
x22       s6        saved register 6                     yes
x23       s7        saved register 7                     yes
x24       s8        saved register 8                     yes
x25       s9        saved register 9                     yes
x26       s10       saved register 10                    yes
x27       s11       saved register 11                    yes
x28       t3        temporary register 3                 no
x29       t4        temporary register 4                 no
x30       t5        temporary register 5                 no
x31       t6        temporary register 6                 no
pc        (none)    program counter                      n/a

This table was taken from here. Registers a0 to a7 are used to pass function parameters (arguments), and a7 is also used for Linux system calls, where it holds the Linux function number from unistd.h.
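The ABI names in the table follow a regular pattern. As a small illustration, here is a Python sketch that rebuilds the x-register to ABI-name mapping from the table; abi_name is a hypothetical helper written just for this post, not part of any toolchain.

```python
# Rebuild the RISC-V integer register ABI names from the table above.
# abi_name() is a hypothetical helper written for this illustration.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp"]       # x0-x4
    + [f"t{i}" for i in range(0, 3)]       # x5-x7:   t0-t2
    + ["s0", "s1"]                         # x8-x9:   s0 (also fp), s1
    + [f"a{i}" for i in range(0, 8)]       # x10-x17: a0-a7
    + [f"s{i}" for i in range(2, 12)]      # x18-x27: s2-s11
    + [f"t{i}" for i in range(3, 7)]       # x28-x31: t3-t6
)

def abi_name(n: int) -> str:
    """Return the ABI name for register xN (0 <= n <= 31)."""
    if not 0 <= n < 32:
        raise ValueError(f"x{n} is not an integer register")
    return ABI_NAMES[n]

# The registers used by Hello World
for n in (10, 11, 12, 17):
    print(f"x{n} = {abi_name(n)}")   # x10 = a0, x11 = a1, x12 = a2, x17 = a7
```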

Instructions

We only use three Assembly instructions in this program: LA, ADDI and ECALL. Risc-V works hard to define as few instructions as possible, so some instructions have multiple uses. For instance, ADDI adds an immediate value to a register, and has the form:

      ADDI RD, RS, imm

Where RD is the destination register, RS is the source register and imm is a 12-bit immediate value, which is sign-extended, so it ranges from -2048 to 2047. Instructions are 32 bits long, so the size of the immediate value tends to be whatever is left over after encoding the opcode and any required registers.
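To make the field layout concrete, here is a Python sketch that packs ADDI's fields into a 32-bit word. The field positions (immediate in bits 31-20, rs1 in bits 19-15, funct3 = 000 in bits 14-12, rd in bits 11-7, opcode 0x13 in bits 6-0) follow the RISC-V I-type instruction format; encode_addi is a hypothetical name for this illustration. The results can be compared with the objdump listing in the Building section.

```python
# Encode a RISC-V ADDI instruction (I-type format):
#   imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011
OPCODE_OP_IMM = 0b0010011  # 0x13, the register-immediate ALU opcode used by ADDI

def encode_addi(rd: int, rs1: int, imm: int) -> int:
    """Return the 32-bit machine word for ADDI rd, rs1, imm (hypothetical helper)."""
    if not -2048 <= imm <= 2047:
        raise ValueError("ADDI immediate must fit in a signed 12-bit field")
    for reg in (rd, rs1):
        if not 0 <= reg < 32:
            raise ValueError(f"bad register x{reg}")
    # imm & 0xFFF keeps the two's complement bits of a negative immediate
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | OPCODE_OP_IMM

# addi a0, x0, 1   (a0 is x10)
print(f"{encode_addi(10, 0, 1):08x}")    # 00100513
# addi a7, x0, 93  (a7 is x17)
print(f"{encode_addi(17, 0, 93):08x}")   # 05d00893
```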

You can define a NOP instruction with:

      ADDI x0, x0, 0

Or load immediate with:

      ADDI RD, X0, imm

The Assembler will take opcodes like NOP or LI (Load Immediate) and translate them into the correct underlying instruction. Here we used ADDI, but when we disassemble the program we’ll see the disassembler uses these aliases, which make the program more readable. All our ADDI instructions follow the LI pattern. (LI only becomes a single ADDI when the value fits in 12 bits; larger values take more than one instruction.)
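Going the other way, a disassembler only needs to look at the register fields to decide which alias to print. This Python sketch mimics the NOP and LI aliases that objdump prints; disassemble_addi is a hypothetical helper and it only understands ADDI.

```python
# A tiny disassembler for ADDI that prints the NOP and LI aliases.
# Only ADDI is handled; any other instruction is reported as "unknown".
REG = (["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
       + [f"a{i}" for i in range(8)]
       + [f"s{i}" for i in range(2, 12)]
       + [f"t{i}" for i in range(3, 7)])

def disassemble_addi(word: int) -> str:
    opcode = word & 0x7F
    funct3 = (word >> 12) & 0x7
    if opcode != 0x13 or funct3 != 0:
        return "unknown"
    rd = (word >> 7) & 0x1F
    rs1 = (word >> 15) & 0x1F
    imm = (word >> 20) & 0xFFF
    if imm & 0x800:                      # sign-extend the 12-bit immediate
        imm -= 0x1000
    if rd == 0 and rs1 == 0 and imm == 0:
        return "nop"                     # ADDI x0, x0, 0
    if rs1 == 0:
        return f"li {REG[rd]},{imm}"     # ADDI rd, x0, imm
    return f"addi {REG[rd]},{REG[rs1]},{imm}"

for word in (0x00100513, 0x00000013, 0x02058593):
    print(f"{word:08x}: {disassemble_addi(word)}")
# 00100513: li a0,1
# 00000013: nop
# 02058593: addi a1,a1,32
```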

Risc-V provides a separate instruction to call the operating system: ECALL. When calling Linux, a7 holds the Linux service number and a0 to a5 hold any parameters. When calling write, we need the file descriptor (1 for stdout), the address of the string to write and the length in bytes to write, which we put in registers a0, a1 and a2. The return code, which we don’t check, comes back in a0: the number of bytes written, or a negative error code. This differs from architectures that use the interrupt mechanism for this purpose. The Risc-V designers feel it is cleaner to separate operating system calls from interrupts, even though both transfer control to privileged kernel code.
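For comparison, here is a minimal Python sketch of the same write call, assuming a Linux host. Python's os.write() wraps the write() system call; the syscall number differs between architectures (64 on RISC-V Linux), but the arguments are the same three values the assembly program puts in a0, a1 and a2. The variable names are illustrative only.

```python
import os

# Sketch of the write() call from Hello World, using Python's os module.
STDOUT = 1                     # a0: file descriptor 1 = stdout
message = b"Hello World!\n"    # a1: the bytes to write
length = len(message)          # a2: 13 bytes, including the newline

# os.write returns the number of bytes written, the value the kernel
# hands back in a0 after the ecall.
written = os.write(STDOUT, message[:length])
print("bytes written:", written)

# The exit call (a7 = 93, a0 = 0) corresponds to os._exit(0); it is left
# out so that this sketch does not terminate the interpreter.
```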

The remaining instruction, LA, isn’t really a Risc-V instruction; it is a pseudo-instruction that tells the Assembler we want to load an address into a register, and leaves it up to the Assembler to figure out how to do this. If we are running with 64-bit addressing, then this address will be 64 bits. We can’t load it with a single load immediate instruction, since the biggest immediate value is 20 bits, and most are smaller. This means that to load the address we would either need many instructions to build it piece by piece using load immediates, shifts, logical operations and/or arithmetic operations, or some shortcut. The Assembler (together with the linker) knows where this address is, so it can, say, use PC-relative addressing to load it. There are a lot of tricks for dealing with 64-bit values using 32-bit instructions that we don’t have room to go into now, but perhaps in a future blog article.
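One common form of PC-relative addressing is an AUIPC/ADDI pair: the 32-bit distance from the instruction to the target is split into a 20-bit upper part and a sign-extended 12-bit lower part. Here is a Python sketch of that split, assuming the target lies within roughly ±2 GiB of the code; split_pcrel is a hypothetical name, since in practice the assembler and linker do this through relocations.

```python
def split_pcrel(offset: int):
    """Split a PC-relative offset into (hi20, lo12) for an AUIPC/ADDI pair.

    The pair computes pc + (hi20 << 12) + lo12, where ADDI sign-extends
    lo12, so lo12 must stay within -2048..2047.
    """
    # hi20 must fit in AUIPC's signed 20-bit immediate field.
    if not -(1 << 31) <= offset + 0x800 < (1 << 31):
        raise ValueError("offset out of AUIPC/ADDI range")
    # Adding 0x800 rounds hi20 up whenever bit 11 of the offset is set;
    # lo12 then becomes negative to compensate.
    hi20 = (offset + 0x800) >> 12
    lo12 = offset - (hi20 << 12)
    return hi20, lo12

# Distance from the auipc (0x100b4) to helloworld (0x110d4) in the listing
hi, lo = split_pcrel(0x110d4 - 0x100b4)
print(f"auipc a1,{hi:#x}")    # auipc a1,0x1
print(f"addi  a1,a1,{lo}")    # addi  a1,a1,32
```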

Building

I don’t have a Risc-V processor, so I built the program using cross-compilation. The instructions for installing the GCC tools for this on a Debian-based Linux are here. Then to build, you run:

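riscv64-linux-gnu-as -march=rv64imac -o HelloWorld.o HelloWorld.s
riscv64-linux-gnu-ld -o HelloWorld HelloWorld.o

Note: with some programs you may also need to pass -mno-relax to the assembler. Without it, the linker can relax LA into an offset from the global pointer (gp), which this program never sets up, so the write call receives a bad address. See the discussion in the comments below.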

We can run a Risc-V objdump to see what was produced with:

riscv64-linux-gnu-objdump -d HelloWorld

And get:

HelloWorld:     file format elf64-littleriscv

Disassembly of section .text:

00000000000100b0 <_start>:
   100b0: 00100513           li a0,1
   100b4: 00001597           auipc a1,0x1
   100b8: 02058593           addi a1,a1,32 # 110d4 <__DATA_BEGIN__>
   100bc: 00d00613           li a2,13
   100c0: 04000893           li a7,64
   100c4: 00000073           ecall
   100c8: 00000513           li a0,0
   100cc: 05d00893           li a7,93
   100d0: 00000073           ecall

We see it has interpreted the ADDI instructions that are just loading an immediate as LI. 

The “LA a1, helloworld” pseudo-instruction has been compiled to:

   100b4: 00001597           auipc a1,0x1
   100b8: 02058593           addi a1,a1,32 # 110d4 <__DATA_BEGIN__>

AUIPC is Add Upper Immediate to PC: it shifts its 20-bit immediate left by 12 bits and adds the result to the address of the AUIPC instruction itself. So it puts 0x100b4 + 0x1000 = 0x110b4 into a1, and the ADDI then adds 32 to reach 0x110d4, the beginning of the data section where helloworld lives. Actually, the Assembler marked these instructions as needing relocation, and the constants were filled in by the linker in the LD command. The good thing is that the Assembler and linker took care of these details so we didn’t need to. Loading addresses and large integers is always a challenge on RISC processors.
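To double-check that arithmetic, this Python sketch decodes the two instruction words from the objdump listing and recomputes a1, wrapping to 64 bits as an RV64 register would. The helper sext is hypothetical, written for this illustration.

```python
# Recompute the address that "auipc a1,0x1; addi a1,a1,32" produces,
# starting from the raw instruction words in the objdump listing.
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

MASK64 = (1 << 64) - 1
pc = 0x100b4
auipc_word, addi_word = 0x00001597, 0x02058593

# AUIPC (U-type): rd = pc + sign-extended (imm[31:12] << 12)
a1 = (pc + sext(auipc_word & 0xFFFFF000, 32)) & MASK64
print(f"after auipc: a1 = {a1:#x}")   # 0x110b4

# ADDI (I-type): rd = rs1 + sign-extended imm[11:0]
a1 = (a1 + sext(addi_word >> 20, 12)) & MASK64
print(f"after addi:  a1 = {a1:#x}")   # 0x110d4, the helloworld label
```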

Running

Now I have our HelloWorld executable on my Intel i3 laptop running Ubuntu Linux. To run it, I use the TinyEMU Risc-V emulator. There are instructions for running a mini version of Linux under the emulator; you can then mount your /tmp folder and copy the executable over. Then it runs.

The whole process is:

stephen@stephenubuntu:~/riscV/HelloWorld$ bash -x ./build
+ riscv64-linux-gnu-as -march=rv64imac -o HelloWorld.o HelloWorld.s
+ riscv64-linux-gnu-ld -o HelloWorld HelloWorld.o
stephen@stephenubuntu:~/riscV/HelloWorld$ cp HelloWorld /tmp
stephen@stephenubuntu:~/riscV/HelloWorld$ cd ../../Downloads/diskimage-linux-riscv-2018-09-23/
stephen@stephenubuntu:~/Downloads/diskimage-linux-riscv-2018-09-23$ temu root_9p-riscv64.cfg 
[    0.307640] NET: Registered protocol family 17
[    0.308079] 9pnet: Installing 9P2000 support
[    0.311914] EXT4-fs (vda): couldn't mount as ext3 due to feature incompatibilities
[    0.312757] EXT4-fs (vda): mounting ext2 file system using the ext4 subsystem
[    0.325269] EXT4-fs (vda): mounted filesystem without journal. Opts: (null)
[    0.325552] VFS: Mounted root (ext2 filesystem) on device 254:0.
[    0.326420] devtmpfs: mounted
[    0.326785] Freeing unused kernel memory: 80K
[    0.326949] This architecture does not have kernel memory protection.
~ # mount -t 9p /dev/root /mnt
~ # cp /mnt/HelloWorld .
~ # ./HelloWorld 
Hello World!
~ # 

Note: I had to add:

      kernel: "kernel-riscv64.bin",

To root_9p-riscv64.cfg in order for it to start properly.

Summary

This simple Hello World program showed us a basic Risc-V Assembly Language program that loads some registers and calls Linux to print a string and then exit. This was still a long blog posting since we needed to explain all the Assembly elements and then how to build and run the program without requiring any Risc-V hardware.

Written by smist08

September 7, 2019 at 10:38 pm

Posted in RiscV


22 Responses


  1. what does the addi a7, x0, 93 and ecall do? How do they end the program and why?

    Great post ^^

    Darío425

    December 8, 2019 at 7:19 am

    • This adds 93 to the zero register and puts the result in register a7. 93 is the Linux service call number to terminate the program. Then ecall makes a call to the Linux kernel, switching to kernel space and kernel privilege level. Then Linux terminates the program.

      smist08

      December 8, 2019 at 3:03 pm

      • I've been using the same order but with number 10 in a RISC-V processor and I didn’t know why

        Thank you, you're in my presentation’s bibliography now ❤️

        Darío425

        December 8, 2019 at 3:08 pm

      • Beware that the numbers are different between 32-bit Linux and 64-bit Linux. Check /usr/include/asm-generic/unistd.h for the correct values for your system.

        smist08

        December 8, 2019 at 3:12 pm

  2. Is there any way to map part of my assembly code to a particular location?

    Somesh

    March 24, 2020 at 8:30 pm

  3. […] blogged on RISC-V processors a couple of times, this is an open source hardware specification so you can develop a processor without paying […]

  4. “AUIPC is add immediate to PC, so it put PC+1 into A1”

    Actually it adds 0x1000. (Add Upper Immediate to PC)

    Zolee

    July 16, 2020 at 1:06 am

  5. […] past week, my blog articles on RISC-V have spiked in readership. I can only assume that the pending sale of ARM has stimulated […]

  6. Note: I had to add: kernel: “kernel-riscv64.bin”, that was helpful, thank you

    Leo

    February 11, 2022 at 8:55 am

  7. Hi, it's very interesting!! I just want to run an asm code in a RISC-V Spike simulator, can you help me proceed?

    Sree

    March 31, 2022 at 11:40 am

  8. Any chance of adapting this for those who have a RISC-V processor? I am super keen to start real-world RISC-V assembler use on my VisionStar V1 RISC-V single board computer running Fedora. See here https://shop.allnetchina.cn/collections/starfive/products/starfive-visionfive-ai-single-board-computer. I am a novice…..

    adingbatponder

    April 9, 2022 at 8:30 am

    • That board looks interesting. Too bad it's sold out. I wonder when they’ll make some more?

      smist08

      April 9, 2022 at 8:37 am

      • no longer sold out !

        adingbatponder

        April 18, 2022 at 5:49 am

  9. Very nice, thanks for this article! Was able to get this working on my RISC-V board, the VisionFive. Don’t have to change anything except the obvious, the two commands become “as -o HelloWorld.o HelloWorld.s” and “ld -o HelloWorld HelloWorld.o”.

    amihart

    May 18, 2022 at 7:38 pm

  10. […] in September, 2019, I posted an article presenting a RISC-V Assembly Language “Hello World” program which I ran using the TinyEMU system […]

  11. I get stuck when I add a second string to .data (helloagain: .ascii "Hello Again!\n"): I get a crash if I try to output it after the helloworld string. Any tips on outputting multiple strings? Thank you.

    Rich

    March 31, 2024 at 5:25 am

    • Should be ok. Check the string length put into a2, if this is too big, the program will crash. Note you have to reset this as the value will be overwritten in the first call.

      smist08

      March 31, 2024 at 4:18 pm

      • This is the code:

        .global _start      # Provide program starting address to linker

        # Setup the parameters to print hello world
        # and then call Linux to do it.

        _start: addi  a0, x0, 1      # 1 = StdOut
                la    a1, helloworld # load address of helloworld
                addi  a2, x0, 13     # length of our string
                addi  a7, x0, 64     # linux write system call
                ecall                # Call linux to output the string

                addi  a0, x0, 1      # 1 = StdOut
                la    a1, helloagain # load address of helloagain
                addi  a2, x0, 13     # length of our string
                addi  a7, x0, 64     # linux write system call
                ecall                # Call linux to output the string

        # Setup the parameters to exit the program
        # and then call Linux to do it.

                addi  a0, x0, 0      # Use 0 return code
                addi  a7, x0, 93     # Service command code 93 terminates
                ecall                # Call linux to terminate the program

        .data
        helloworld: .ascii "Hello World!\n"
        helloagain: .ascii "Hello Again!\n"

        ##############

        But when I run it, it prints out Hello World! and then gives a segfault. I have tried with Spike, QEMU and a Milk-V Pioneer, and all fail:

        ./hello
        Hello World!
        z 0000000000000000 ra ffffffc000001dc2 sp ffffffc00041bc40 gp 0000000000000000
        tp 0000000000000000 t0 0000003ffffffb50 t1 ffffffbf80000000 t2 0000000000000000
        s0 fffffffffffff80d s1 000000000000000d a0 ffffffc00041bc70 a1 fffffffffffff80d
        a2 ffffffc00041bc7d a3 fffffffffffff81a a4 0000000000000000 a5 ffffffc00041bc70
        a6 0000000000000040 a7 ffffffc0000024d4 s2 8000000200006620 s3 ffffffc00041bc70
        s4 000000000000000d s5 ffffffc00000c008 s6 0000000000000200 s7 0000000000000000
        s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
        t3 0000000000000008 t4 0000000000000000 t5 0000000000000000 t6 0000000000000000
        pc ffffffc000009344 va/inst fffffffffffff80d sr 8000000200046700
        Kernel load segfault @ 0xfffffffffffff80d

        Rich

        April 1, 2024 at 9:41 am

      • Hi Rich,
        An annoying RISC-V thing: you need to assemble with as -mno-relax -o rich.o rich.S, then ld -o rich rich.o. If you don’t specify -mno-relax, the assembler uses an offset from the global pointer, which isn’t set up. You could also set up the global pointer, which is done if you link in the C runtime.

        smist08

        April 1, 2024 at 1:06 pm

      • The backslashes seem to vanish from the newline characters when I posted this reply.

        Rich

        April 1, 2024 at 9:43 am

      • Thank you for the -mno-relax flag, I've been trying to find faults in my code. Thanks again.

        Rich

        April 1, 2024 at 2:20 pm

